The Scottish Qualifications Authority owns the copyright to its exam papers and marking instructions.
Hints offered by N Hopley
Paper 1
Question 1
1a) Hint 1: note the two aspects that you have to comment on: location and spread
1a) Hint 2: be sure to use the boxplot on the left that relates to grams of fat, and not the boxplot on the right that relates to calories
1b) Hint 3: make reference to both upper and lower fences
1c) Hint 4: know that the outliers that we are interested in here are below the lower fence, which is Q1 - 1.5 × (Q3 - Q1)
1c) Hint 5: be sure to clearly communicate that the two outliers are numerically less than the value of the lower fence
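The fence calculation from Hint 4 can be sketched in Python. This is only an illustration: the quartiles and the data values below are hypothetical, not taken from the exam paper, and the function names are made up for this sketch.

```python
def fences(q1, q3):
    """Return (lower_fence, upper_fence) using the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(data, q1, q3):
    """Return the values lying below the lower fence or above the upper fence."""
    lower, upper = fences(q1, q3)
    return [x for x in data if x < lower or x > upper]


# Hypothetical quartiles and data (NOT the values from the question)
q1, q3 = 20.0, 28.0
data = [3, 5, 19, 22, 25, 30]

lower_fence, upper_fence = fences(q1, q3)
low_outliers = [x for x in find_outliers(data, q1, q3) if x < lower_fence]
print(lower_fence, upper_fence, low_outliers)
```

When writing up the answer, state the numerical value of the lower fence and then compare each outlier with it explicitly.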
1d) Hint 6: think of the process that would have had to happen in order to gather and record the data, and what might have gone wrong somewhere in this process
1e) Hint 7: know that a t-test is generally used when a population standard deviation is not known
1e) Hint 8: know that in order to use a t-test, we therefore need to use the sample standard deviation as an estimate of the population standard deviation
1e) Hint 9: look at Output 1 to see whether the required information is there, or not.
1f) Hint 10: take the time to write out explicitly in words the two hypotheses, null and alternative
1f) Hint 11: be clear that the hypotheses must refer to the appropriate populations' means
1g) Hint 12: know that a p-value for a two-tailed test is 2 × the probability of obtaining the observed test statistic, or a more extreme value
1g) Hint 13: here, the required p-value = 2 × P(t₇₅ > 1.0496)
1g) Hint 14: know that a t₇₅ distribution can be approximated by the standard normal distribution, Z
1g) Hint 15: know that to use Table 3 in the Statistical Formulae and Tables booklet, we have to round 1.0496 to 1.05
1g) Hint 16: know that P(Z > 1.05) = 1 - P(Z < 1.05)
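As a check on Hints 13 to 16, the calculation can be sketched with Python's standard library. The only inputs are the test statistic 1.0496 and the normal approximation described above; the variable names are just for this sketch.

```python
from statistics import NormalDist

t_stat = 1.0496                # test statistic quoted in Hint 13
z = round(t_stat, 2)           # 1.05, rounded as needed for the table lookup

upper_tail = 1 - NormalDist().cdf(z)   # P(Z > 1.05) = 1 - P(Z < 1.05)
p_value = 2 * upper_tail               # two-tailed test, so double the tail

print(round(p_value, 4))
```

This gives a p-value of roughly 0.29, which should agree with the value obtained from Table 3 to within rounding.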
1h) Hint 17: when no direction is provided, it is the convention to use a significance level of 5%
1h) Hint 18: make sure to comment in terms of the impact to the mean calorie intake
1i) Hint 19: note that there are several coffee shop chains in the UK, and only one was selected for this sample
1i) Hint 20: note that from the single coffee shop chain, a (simple) random sample of bakery and non-bakery items was taken
1i) Hint 21: so we have two stages of random selection, and thus this is two-stage cluster sampling
1i) Hint 22: make sure to describe how a single coffee shop chain might have been selected
1i) Hint 23: make sure to describe how a simple random sample of bakery/non-bakery items might have been conducted
Question 2
2a)i) Hint 1: notice that as crop density increases, so does crop yield
2a)i) Hint 2: this suggests a positive relationship
2a)i) Hint 3: looking at the positioning of the points, a positively correlated linear relationship between the two variables is very plausible
2a)ii) Hint 4: know that a residual = observed value - fitted value
2a)ii) Hint 5: know that a residual is a measure of the error in the observed data that is not explained by the linear model (that gives the fitted value)
2a)iii) Hint 6: the residuals do have a mean of zero, so that is not the reason
2a)iii) Hint 7: the residuals do have a constant variance, so that is also not the reason (see the next hint for the reason behind this)
2a)iii) Hint 8: reason for constant variance: looking from left to right, taking two or three points at a time, the 'spread' of those groups of points is roughly the same
2a)iii) Hint 9: the only remaining reason for the residual plot not being acceptable is the non-random pattern, that has a 'U' shape
2a)iii) Hint 10: this U shape tells us that the linear model starts off underestimating the crop yield, then overestimating it, then underestimating it again
2a)iii) Hint 11: hence transforming the data before fitting a new linear model would hopefully give more consistent estimates of the crop yield for different crop densities
2b)i) Hint 12: accept that you will use the usual formulae for this process, replacing the notation of y with √y
2b)i) Hint 13: calculate 'b' using Sx√y and Sxx
2b)i) Hint 14: obtain the mean of √y, using Σ√y and n = 13
2b)i) Hint 15: obtain x̄ using Σx and n = 13
2b)i) Hint 16: calculate 'a' using the previously obtained values
2b)i) Hint 17: be sure to write your linear regression equation in terms of √y = ...
2b)ii) Hint 18: substitute the value of x = 3.5 into your regression equation to obtain the fitted value for √(crop yield)
2b)ii) Hint 19: use your knowledge from part (a)(ii) to calculate the residual for this data point
2b)ii) Hint 20: use the fitted value for the √(crop yield) and the residual value to locate the required point on the residual plot
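The process in Hints 12 to 19 can be sketched as follows. The data here are hypothetical and deliberately chosen so that √y is exactly linear in x, which makes the result easy to check by hand; they are not the 13 data points from the question, and the function names are invented for this sketch.

```python
from math import sqrt


def fit_sqrt_model(xs, ys):
    """Least-squares fit of sqrt(y) = a + b*x. Returns (a, b)."""
    n = len(xs)
    roots = [sqrt(y) for y in ys]          # transformed response, sqrt(y)
    x_bar = sum(xs) / n
    root_bar = sum(roots) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_x_root = sum((x - x_bar) * (r - root_bar) for x, r in zip(xs, roots))
    b = s_x_root / s_xx                    # slope
    a = root_bar - b * x_bar               # intercept
    return a, b


def residual(a, b, x, y):
    """Residual on the transformed scale: observed sqrt(y) minus fitted sqrt(y)."""
    return sqrt(y) - (a + b * x)


# Hypothetical data constructed so that sqrt(y) = 1 + 2x exactly
xs = [1, 2, 3, 4]
ys = [9, 25, 49, 81]

a, b = fit_sqrt_model(xs, ys)
print(a, b, residual(a, b, 3, 49))
```

Note that both the fitted value and the residual are on the √y scale, which is what the residual plot in part (b)(ii) uses.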
2c)i) Hint 21: look at just the crop density values in Figure 1 and Figure 2
2c)i) Hint 22: know that a linear model is only potentially valid for values that were encompassed by the original data set
2c)ii) Hint 23: read lines 7 to 9 of the report
2c)ii) Hint 24: recognise that a linear equation will never reveal a maximum value, by the nature of it being linear
2c)ii) Hint 25: know that the transformation of data from (crop yield) to √(crop yield) will only 'slightly curve the line' and still not give a maximum
2c)iii) Hint 26: recognise that the report did not fit crop yield, but rather √(crop yield)
2c)iii) Hint 27: recognise that the report did not cover all crop densities, only those between 2 and 8 plants/m²
Paper 2
Question 1
1a) Hint 1: define a random variable, X, for the height of one boy, and state its distribution and its parameters
1a) Hint 2: calculate P(X > 111) using either tables, or graphic calculator
1b) Hint 3: define a new random variable, X̄, for the mean height of 25 boys, and state its distribution and its parameters
1b) Hint 4: calculate P(X̄ > 111) using either tables, or graphic calculator
1c) Hint 5: look at the parameters of distributions for X and X̄ and see how they are different
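To see how the two distributions differ, here is a minimal sketch. The mean of 108 and standard deviation of 5 are hypothetical values chosen for illustration, not the parameters given in the question; only the threshold 111 and the sample size 25 come from the hints.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical parameters (NOT from the question)
mu, sigma = 108.0, 5.0
n = 25

X = NormalDist(mu, sigma)                  # height of one boy
X_bar = NormalDist(mu, sigma / sqrt(n))    # mean height of 25 boys

p_single = 1 - X.cdf(111)      # P(X > 111)
p_mean = 1 - X_bar.cdf(111)    # P(X-bar > 111)

print(round(p_single, 4), round(p_mean, 5))
```

Both distributions share the same mean, but the standard deviation of X̄ is smaller by a factor of √n, so an extreme sample mean is much less likely than an extreme individual value.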
Question 2
Hint 1: know that the assumption required by the Wilcoxon Signed Rank test is that the distributions used are each symmetrical
Hint 2: make sure to mention the distribution of steps is symmetrical (you must always include the context)
Hint 3: calculate the differences between the recorded values of steps, and the number 300
Hint 4: looking at the absolute values of these differences, rank the |differences|
Hint 5: notice that there is one pair of equal values of |differences|, and this will affect the values of their ranks
Hint 6: calculate the sum of the ranks for the positive differences and/or the negative differences, whichever is going to be the smaller
Hint 7: re-read the final sentence of the question to decide whether this is a one-tailed or two-tailed hypothesis test
Hint 8: use Table 7 of the Statistical Formulae and Tables booklet, to obtain the appropriate one-tailed critical value
Hint 9: decide whether to reject H0 or not to reject H0
Hint 10: communicate what this evidence suggests, making sure to mention 'median number of steps' (you must always include the context)
Hint 11: communicate clearly what this means in terms of whether the mobile phone over-counts the number of steps, or not.
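The ranking in Hints 3 to 6, including the treatment of tied |differences|, can be sketched like this. The step counts below are hypothetical, not the data from the question; only the comparison value of 300 comes from Hint 3.

```python
def signed_rank_sums(values, hypothesised_median):
    """Return (W_plus, W_minus), giving tied |differences| their average rank."""
    # Differences from the hypothesised median; zero differences are dropped
    diffs = [v - hypothesised_median for v in values if v != hypothesised_median]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)

    i = 0
    while i < len(ordered):
        j = i
        # Extend j to cover every difference tied with position i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        average_rank = ((i + 1) + (j + 1)) / 2
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return w_plus, w_minus


# Hypothetical recorded step counts (NOT the values from the question)
steps = [304, 297, 310, 303, 312, 299, 308, 315]
w_plus, w_minus = signed_rank_sums(steps, 300)
print(w_plus, w_minus)
```

Here the tied pair with |difference| = 3 share ranks 2 and 3, so each receives 2.5. The smaller of the two sums is the test statistic to compare with the critical value from Table 7.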
Question 3
3a) Hint 1: define a random variable, X, for the number of blood donors with blood type B-
3a) Hint 2: determine what the distribution of X will be, along with its parameters
3a) Hint 3: calculate P(X ≥ 2) using either tables, or graphic calculator
3b) Hint 4: define a random variable, Y, for the number of blood donors with blood type O+ or O-
3b) Hint 5: calculate the combined probability of having either O+ or O- blood
3b) Hint 6: determine what the distribution of Y will be, along with its parameters
3b) Hint 7: with Y ∼ B(50, 0.504), we now approximate it with a normal distribution
3b) Hint 8: calculate the mean and the variance of the normal distribution (checking that np > 5 and nq > 5, as it's a good habit to do)
3b) Hint 9: if W is the normal approximation to Y, then W ∼ N(25.2, 12.4992)
3b) Hint 10: know that P(Y ≤ 30) = P(W ≤ 30.5) due to continuity correction
3b) Hint 11: calculate P(W ≤ 30.5) using either tables or a graphic calculator
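Hints 7 to 11 can be checked numerically. This sketch uses only the values already stated above, Y ∼ B(50, 0.504), and also computes the exact binomial probability for comparison, which the question itself does not ask for.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 50, 0.504
q = 1 - p

mean = n * p            # 25.2
variance = n * p * q    # 12.4992

# Normal approximation W ~ N(25.2, 12.4992) with continuity correction
W = NormalDist(mean, sqrt(variance))
approx = W.cdf(30.5)    # P(Y <= 30) is approximated by P(W <= 30.5)

# Exact binomial probability, for comparison only
exact = sum(comb(n, k) * p**k * q ** (n - k) for k in range(31))

print(round(approx, 4), round(exact, 4))
```

The approximate probability comes out at about 0.933, and the two values should be close, which is what the np > 5 and nq > 5 check is meant to reassure us about.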
Question 4
Hint 1: a standard chi-squared goodness of fit question...
Hint 2: make sure the null hypothesis references the specified ratio, or similar
Hint 3: calculate the expected frequencies by using the ratios 1:1:2 and the total sample size of 320
Hint 4: know that the degrees of freedom = categories - constraints
Hint 5: we have 3 categories here, and only 1 constraint (i.e. the sum of the frequencies must equal 320)
Hint 6: obtain the value of the test statistic, X²
Hint 7: obtain either critical values from the Data Booklet, or p-value from a graphic calculator
Hint 8: decide whether or not to reject H0
Hint 9: make sure that final statement includes the context of the problem
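The mechanics of Hints 3 to 7 can be sketched as below. The ratio 1:1:2 and the total of 320 come from Hint 3, but the observed frequencies are hypothetical, so the resulting statistic is not the answer to the question. The p-value line relies on the fact that, for 2 degrees of freedom only, the chi-squared upper tail probability is exp(-x/2).

```python
from math import exp


def expected_frequencies(ratio, total):
    """Split the total sample size according to the given ratio."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]


def chi_squared_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical observed counts summing to 320 (NOT the data from the question)
observed = [70, 90, 160]
expected = expected_frequencies([1, 1, 2], 320)   # 80, 80, 160

x2 = chi_squared_statistic(observed, expected)
df = len(observed) - 1          # 3 categories minus 1 constraint
p_value = exp(-x2 / 2)          # upper tail probability, valid for df = 2 only

print(expected, x2, df, round(p_value, 4))
```

With a real data set, the same statistic would be compared with the chi-squared critical value on 2 degrees of freedom from the Data Booklet.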
Question 5
5a)i) Hint 1: know that for the distribution of a random variable to be valid, all of its probabilities must sum to 1
5a)i) Hint 2: so P(X = 4) = 1 - P(X ≤ 3)
5a)i) Hint 3: after calculating an expression for P(X = 4) in terms of p, proceed with calculating E(X) in the normal manner
5a)ii) Hint 4: using E(X) = 3, calculate the value of p using E(X) = 4 - 16p, from part (a)(i)
5a)ii) Hint 5: proceed to calculate V(X) in the normal manner
5b) Hint 6: as Y ∼ Po(1), write down E(Y) and V(Y)
5b) Hint 7: use laws of expectation and variance to calculate E(K) and V(K)
5b) Hint 8: remember that V(aY) = a²V(Y)
5b) Hint 9: remember to state the value of SD(K)
Question 6
Hint 1: recognise that we have a single sample of 75 baby lengths
Hint 2: recognise that we don't have the baby lengths population standard deviation
Hint 3: know that we shall have to estimate the population standard deviation from the sample standard deviation
Hint 4: this all suggests that a single sample t-test is required
Hint 5: however, we are told that the sample standard deviation is a good estimate of the population standard deviation, and the resulting t₇₄ distribution can be approximated by a Z distribution, so a single sample z-test is the appropriate choice going forward
Hint 6: calculate the sample mean, x̄, from Σx and n = 75
Hint 7: calculate the sample standard deviation, s, using the formula on page 4 of the Statistical Formulae and Tables booklet
Hint 8: state your hypotheses in terms of the population mean
Hint 9: define X and X̄, using all of the data so far gathered
Hint 10: using the Z distribution, calculate the test statistic, or the p-value, for the sample mean
Hint 11: decide whether to reject H0 or not to reject H0
Hint 12: communicate what this evidence suggests, making sure to mention 'mean baby length' (you must always include the context)
Hint 13: write a clear comment on the midwife's theory
Question 7
7a) Hint 1: draw a tree diagram!
7a) Hint 2: your tree diagram should have a first set of branches with 'jam', 'cheese', 'tuna' with the second set of branches being 'water', 'lemonade' and the third set of branches being 'apple', 'banana'
7a) Hint 3: recognise that P(tuna ∩ water) = 0.035
7a) Hint 4: know that P(water | tuna) = P(tuna ∩ water) ÷ P(tuna)
7b) Hint 5: recognise that we need to know either P(banana) or P(apple)
7b) Hint 6: use P(cheese ∩ banana) to help obtain P(banana)
7b) Hint 7: know that P(cheese ∩ banana) = P(cheese) × P(banana) as they are independent events
7b) Hint 8: notice that 'fruit being an apple' is the complementary event to 'fruit being a banana'
7b) Hint 9: know that P(jam ∩ apple) = P(jam) × P(apple) as they are independent events
Question 8
Hint 1: recognise that this is a hypothesis test on ρ, the population correlation coefficient
Hint 2: use the formulae from the Data Booklet to calculate the test statistic, t, using n and r
Hint 3: the number of degrees of freedom is two less than the sample size (due to it being bivariate data)
Hint 4: note that you are conducting a two-tailed test
Hint 5: decide whether to reject H0 or not to reject H0, remembering that 0.1% can be written as 0.001
Hint 6: clearly communicate your conclusion, citing the context of the problem.
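The test statistic in Hint 2 has the standard form t = r√(n − 2) ÷ √(1 − r²), with n − 2 degrees of freedom. A minimal sketch follows; the values r = 0.6 and n = 18 are hypothetical, chosen so the arithmetic is easy to follow, and the function name is invented here.

```python
from math import sqrt


def correlation_t_statistic(r, n):
    """Test statistic for H0: rho = 0, together with its degrees of freedom."""
    t = r * sqrt(n - 2) / sqrt(1 - r**2)
    return t, n - 2


# Hypothetical sample correlation and sample size (NOT from the question)
t, df = correlation_t_statistic(r=0.6, n=18)
print(t, df)
```

The value of t is then compared with the two-tailed critical value of the t distribution on n − 2 degrees of freedom at the 0.1% level.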
Question 9
9a) Hint 1: recognise that you have paired data
9a) Hint 2: recognise that we do not know the population standard deviation
9a) Hint 3: hence we are going to perform a t-test for the mean difference in populations
9a) Hint 4: state your hypotheses in terms of the mean of the differences
9a) Hint 5: decide whether it is a one-tailed or two-tailed test being performed
9a) Hint 6: calculate the test statistic, using x̄, sₙ₋₁ and n
9a) Hint 7: determine the number of degrees of freedom for the t distribution
9a) Hint 8: obtain a critical value, or calculate a p-value
9a) Hint 9: decide whether to reject H0 or not to reject H0
9a) Hint 10: clearly communicate your conclusion, citing the context of the problem.
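The calculation in Hints 6 and 7 can be sketched as follows, assuming a null hypothesis that the mean difference is zero. The paired values are hypothetical, not the data from the question.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical paired measurements (NOT the data from the question)
before = [10, 12, 15, 11, 13]
after = [12, 16, 21, 15, 17]

diffs = [a - b for a, b in zip(after, before)]   # one difference per pair

n = len(diffs)
d_bar = mean(diffs)      # sample mean of the differences
s = stdev(diffs)         # sample standard deviation (divisor n - 1)

t = (d_bar - 0) / (s / sqrt(n))   # assumed H0: mean difference = 0
df = n - 1

print(d_bar, round(s, 4), round(t, 4), df)
```

The pairing matters: the test is carried out on the single sample of differences, not on the two original samples separately.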
9b)i) Hint 11: comment on whether the histogram's shape is one that looks like a normal distribution
9b)ii) Hint 12: know that a Wilcoxon Signed Rank test is also designed for paired data
9b)ii) Hint 13: think about the assumption required for this test and whether the histogram provides any evidence that supports that assumption being valid
Question 10
Hint 1: recognise that you are given data on proportions, so a proportion test is the chosen test to perform
Hint 2: decide whether we have the difference in two population proportions, or a single sample proportion
Hint 3: state your hypotheses in terms of the population proportion, p.
Hint 4: decide whether it is a one-tailed or two-tailed test being performed
Hint 5: calculate the sample proportion test statistic, p̂, using the numbers 23312 and 37878
Hint 6: for the model, define X to be the number of homeless veterans in sheltered accommodation in 2018
Hint 7: determine the distribution of X, and its parameters
Hint 8: this distribution will first be approximated by a normal distribution
Hint 9: approximate X ∼ B(n, p) by a N(np, npq) distribution
Hint 10: create a new random variable to represent the proportion of homeless veterans in sheltered accommodation in 2018
Hint 11: determine the parameters of the normal distribution of this new random variable which will model the proportions
Hint 12: using p̂, obtain a critical value, or calculate a p-value
Hint 13: decide whether to reject H0 or not to reject H0
Hint 14: clearly communicate your conclusion, citing the context of the problem.
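A sketch of the single sample proportion calculation is given below. It assumes that p̂ = 23312 ÷ 37878, which is how Hint 5 reads, and it uses a hypothetical null value p₀ = 0.62 because the hypothesised proportion is not stated in these hints. The function name is invented for this sketch, and the direction of the test should be taken from the question, not from this code.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_proportion_z(successes, n, p0):
    """Return (p_hat, z), using the standard error under H0: p = p0."""
    p_hat = successes / n
    standard_error = sqrt(p0 * (1 - p0) / n)   # from N(p0, p0*q0/n)
    return p_hat, (p_hat - p0) / standard_error


# p0 = 0.62 is a HYPOTHETICAL null value, used for illustration only
p_hat, z = one_sample_proportion_z(23312, 37878, p0=0.62)

# Two-tailed p-value; halve it for a one-tailed test in the observed direction
two_tailed_p = 2 * (1 - NormalDist().cdf(abs(z)))

print(round(p_hat, 4), z, two_tailed_p)
```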
Question 11
Hint 1: recognise that we are looking to set up some equations to allow us to calculate μ and σ
Hint 2: this suggests two simultaneous equations being formed, as we have two unknown variables
Hint 3: know that one equation will come from using P(X > 24) = 0.05
Hint 4: know that the second equation will come from using P(X < 17)=0.10
Hint 5: use the inverse normal tables/function to obtain z-values corresponding to cumulative probabilities of 0.10 and 0.95
Hint 6: assemble the information of 17, μ, σ and -1.28155 into an equation
Hint 7: assemble the information of 24, μ, σ and 1.64485 into an equation
Hint 8: solve these simultaneous equations
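Hints 3 to 8 can be checked with a short sketch that uses only the values stated above. The two equations are 17 = μ − 1.28155σ and 24 = μ + 1.64485σ, and subtracting one from the other eliminates μ.

```python
from statistics import NormalDist

# z-values for cumulative probabilities 0.10 and 0.95
z_low = NormalDist().inv_cdf(0.10)    # approximately -1.28155
z_high = NormalDist().inv_cdf(0.95)   # approximately  1.64485

# 17 = mu + z_low * sigma   and   24 = mu + z_high * sigma
sigma = (24 - 17) / (z_high - z_low)  # subtract the equations to eliminate mu
mu = 17 - z_low * sigma               # substitute back into the first equation

print(round(mu, 2), round(sigma, 2))
```

This gives μ of about 20.07 and σ of about 2.39.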
Question 12
12a) Hint 1: know that the formula for a 99% CI is p̂ ± z₀.₉₉₅ √(p̂q̂/n)
12a) Hint 2: substitute all of the correct values into this formula to obtain the confidence interval
12a) Hint 3: know that the origins of this formula come from a binomial distribution being approximated by a normal distribution
12a) Hint 4: hence the process of approximating one distribution with another inherently introduces added uncertainty
12b) Hint 5: recognise that we want the lower bound of the confidence interval to be greater than 0.50
12b) Hint 6: this means that we want p̂ - z₀.₉₉₅ √(p̂q̂/n) > 0.50
12b) Hint 7: rearrange this inequality to make n the subject
12b) Hint 8: evaluate the inequality to obtain a minimum value for n
12b) Hint 9: know to interpret the value for n, bearing in mind the context of the problem
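The rearrangement in Hints 6 to 8 leads to n > (z√(p̂q̂) ÷ (p̂ − 0.50))², which can be sketched as below. The value p̂ = 0.55 is hypothetical, not the sample proportion from the question, and the sketch assumes that p̂ stays the same as n changes and that p̂ is greater than 0.50. The function names are invented here.

```python
from math import floor, sqrt
from statistics import NormalDist


def lower_bound(p_hat, n, confidence=0.99):
    """Lower limit of the approximate confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z for 0.995 when 99%
    return p_hat - z * sqrt(p_hat * (1 - p_hat) / n)


def minimum_n(p_hat, threshold=0.50, confidence=0.99):
    """Smallest integer n whose lower limit exceeds the threshold (needs p_hat > threshold)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    bound = (z * sqrt(p_hat * (1 - p_hat)) / (p_hat - threshold)) ** 2
    return floor(bound) + 1    # n must be strictly greater than the bound


# p_hat = 0.55 is a HYPOTHETICAL sample proportion, for illustration only
n_min = minimum_n(0.55)
print(n_min, lower_bound(0.55, n_min), lower_bound(0.55, n_min - 1))
```

Because n counts people or items, the answer must be rounded up to a whole number, which is the interpretation step that Hint 9 refers to.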